
Conversation

hahnjo (Member) commented Dec 1, 2025

When IMT is turned on and RPageSinkBuf has an RTaskScheduler, we would previously buffer all pages and create tasks to seal / compress them. While this exposes the maximum amount of parallel work, it wastes memory if the other threads are not fast enough to process the tasks. Heuristically, assume that there is enough work in flight once we already buffer more uncompressed bytes than the approximate zipped cluster size, and from then on seal pages immediately (see the sketch below).
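
For illustration, the heuristic boils down to the following sketch; the type and member names (BufferedSink, fBufferedBytes, fApproxZippedClusterSize, ScheduleSealTask, SealPageNow) are made up for this example, and the real RPageSinkBuf logic is more involved:

```cpp
#include <cstddef>
#include <utility>

struct Page { std::size_t fSize = 0; };

// Illustrative stand-in for RPageSinkBuf, not the actual class.
struct BufferedSink {
   bool fHasTaskScheduler = true;
   std::size_t fBufferedBytes = 0;  // uncompressed bytes with pending seal tasks
   std::size_t fApproxZippedClusterSize = 128 * 1024 * 1024;  // default cluster size

   void SealPageNow(const Page &) { /* compress in the calling thread */ }
   void ScheduleSealTask(Page) { /* hand off to a worker via the task scheduler */ }

   void CommitPage(Page page) {
      if (fHasTaskScheduler && fBufferedBytes < fApproxZippedClusterSize) {
         // Below the threshold: buffer the page and expose it as parallel work.
         fBufferedBytes += page.fSize;
         ScheduleSealTask(std::move(page));
      } else {
         // Enough work is already in flight (or no scheduler): seal immediately
         // so that uncompressed pages do not pile up in memory.
         SealPageNow(page);
      }
   }
};
```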

In a small test, writing random data with ROOT::EnableImplicitMT(1) and therefore no extra worker thread, the application used 500 MB before this change for the default cluster size of 128 MiB. After this change, memory usage is reduced to around 430 MB (compared to 360 MB without IMT). The compression factor is around 2.1x in this case, which roughly checks out: instead of buffering the full uncompressed cluster (around compression factor * zipped cluster size = 270 MiB), we now buffer uncompressed pages only up to the approximate zipped cluster size (128 MiB) and then start compressing pages immediately. The compressed results also need to be buffered, but they are much smaller: (1 - 1 / compression factor) * zipped cluster size = 67 MiB. Accordingly, the gain will be higher for larger compression factors.
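
As a quick back-of-the-envelope check of these numbers (values taken from the test above; the program itself is only illustrative):

```cpp
#include <cstdio>

int main() {
   const double zippedClusterMiB = 128.0;  // default cluster size
   const double compressionFactor = 2.1;   // observed in the test

   // Before: up to the full uncompressed cluster is buffered.
   const double beforeMiB = compressionFactor * zippedClusterMiB;  // ~270 MiB

   // After: uncompressed pages up to the zipped cluster size, plus the
   // already-compressed remainder of the cluster.
   const double compressedRemainderMiB =
      (1 - 1 / compressionFactor) * zippedClusterMiB;              // ~67 MiB
   const double afterMiB = zippedClusterMiB + compressedRemainderMiB;  // ~195 MiB

   // Prints roughly: before: 269 MiB, after: 195 MiB, saved: 74 MiB,
   // consistent with the observed drop from 500 MB to 430 MB.
   std::printf("before: %.0f MiB, after: %.0f MiB, saved: %.0f MiB\n",
               beforeMiB, afterMiB, beforeMiB - afterMiB);
   return 0;
}
```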

Closes #18314, backport of #20425

FYI @Dr15Jones @makortel

Created tasks reference *this, so moving is not safe (illustrated below). Moving is also not needed, because RPageSinkBuf is always held inside a std::unique_ptr.

(cherry picked from commit 672dc1a)
(cherry picked from commit c421df1)
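
To illustrate the point about created tasks referencing *this (a generic example, not the actual RNTuple code): a task that captures `this` pins the object's address, so the object must not be moved afterwards, and holding it behind a std::unique_ptr keeps the address stable without any need to move:

```cpp
#include <functional>
#include <memory>

struct Sink {
   int fCounter = 0;
   std::function<void()> MakeSealTask() {
      return [this] { ++fCounter; };  // captures the current address of the Sink
   }
};

int main() {
   auto sink = std::make_unique<Sink>();  // the unique_ptr owns a stable address
   auto task = sink->MakeSealTask();
   // Moving the Sink to another object now would leave `task` with a dangling
   // `this`; since the Sink always lives behind the unique_ptr, no move is needed.
   task();
   return 0;
}
```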
hahnjo requested a review from jblomer, Dec 1, 2025 16:41
hahnjo self-assigned this, Dec 1, 2025
hahnjo requested a review from pcanal as a code owner, Dec 1, 2025 16:41
makortel commented Dec 1, 2025

Thanks!


github-actions bot commented Dec 2, 2025

Test Results

17 files, 17 suites, 2d 20h 26m 58s ⏱️
2 749 tests: 2 748 ✅, 0 💤, 1 ❌
45 154 runs: 45 153 ✅, 0 💤, 1 ❌

For more details on these failures, see this check.

Results for commit a8721ba.


hahnjo merged commit 8a82325 into root-project:v6-36-00-patches, Dec 4, 2025 (51 of 60 checks passed)
hahnjo deleted the ntuple-imt-mem-v636 branch, December 4, 2025 07:47